Learning Rewards from Linguistic Feedback
We explore unconstrained natural language feedback as a learning signal for
artificial agents. Humans use rich and varied language to teach, yet most prior
work on interactive learning from language assumes a particular form of input
(e.g., commands). We propose a general framework which does not make this
assumption, using aspect-based sentiment analysis to decompose feedback into
sentiment about the features of a Markov decision process. We then perform an
analogue of inverse reinforcement learning, regressing the sentiment on the
features to infer the teacher's latent reward function. To evaluate our
approach, we first collect a corpus of teaching behavior in a cooperative task
where both teacher and learner are human. We implement three artificial
learners: sentiment-based "literal" and "pragmatic" models, and an inference
network trained end-to-end to predict latent rewards. We then repeat our
initial experiment and pair them with human teachers. All three successfully
learn from interactive human feedback. The sentiment models outperform the
inference network, with the "pragmatic" model approaching human performance.
Our work thus provides insight into the information structure of naturalistic
linguistic feedback as well as methods to leverage it for reinforcement
learning.
Comment: 9 pages, 4 figures. AAAI '2
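
As a rough illustration of the method this abstract describes, here is a minimal, hypothetical Python sketch of the sentiment-based reward inference: per-utterance sentiment scores are regressed on indicators of which MDP features each utterance mentions, recovering per-feature reward weights. All feature encodings, sentiment values, and names are illustrative assumptions, not the authors' code or data.

    import numpy as np

    # Hypothetical sketch of the abstract's core idea: decompose feedback
    # into sentiment about MDP features, then regress sentiment on those
    # features to infer the teacher's latent reward function.

    # Each row is one feedback utterance, encoded as indicators of the
    # MDP features it mentions (values invented for illustration).
    features = np.array([
        [1, 0, 0],   # utterance mentions feature 0
        [0, 1, 0],   # utterance mentions feature 1
        [1, 0, 1],   # utterance mentions features 0 and 2
        [0, 0, 1],
    ], dtype=float)

    # Aspect-based sentiment per utterance, in [-1, 1]
    # (e.g., praise -> positive, criticism -> negative).
    sentiment = np.array([0.8, -0.5, 0.3, -0.9])

    # Least-squares regression of sentiment on features recovers
    # per-feature reward weights, the analogue of inverse RL here.
    reward_weights, *_ = np.linalg.lstsq(features, sentiment, rcond=None)
    print("inferred reward weights:", reward_weights)

    # A learner could then score states as reward_weights @ state_features.
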
Words are all you need? Capturing human sensory similarity with textual descriptors
Recent advances in multimodal training use textual descriptions to
significantly enhance machine understanding of images and videos. Yet, it
remains unclear to what extent language can fully capture sensory experiences
across different modalities. A well-established approach for characterizing
sensory experiences relies on similarity judgments, namely, the degree to which
people perceive two distinct stimuli as similar. We explore the relation
between human similarity judgments and language in a series of large-scale
behavioral studies across three modalities (images,
audio, and video) and two types of text descriptors: simple word tags and
free-text captions. In doing so, we introduce a novel adaptive pipeline for tag
mining that is both efficient and domain-general. We show that our prediction
pipeline based on text descriptors exhibits excellent performance, and we
compare it against a comprehensive array of 611 baseline models based on
vision-, audio-, and video-processing architectures. We further show that the
degree to which textual descriptors and models predict human similarity varies
across and within modalities. Taken together, these studies illustrate the
value of integrating machine learning and cognitive science approaches to
better understand the similarities and differences between human and machine
representations. We present an interactive visualization at
https://words-are-all-you-need.s3.amazonaws.com/index.html for exploring the
similarity between stimuli as experienced by humans and different methods
reported in the paper.Comment: Fixed fonts in Figure